class: center, middle, inverse, title-slide

.title[
# Lecture 20
]
.subtitle[
## Multiple Linear Regression
]
.author[
### Psych 10 C
]
.institute[
### University of California, Irvine
]
.date[
### 05/16/2022
]

---

## Review

- Today, we will continue working with the mental rotation example.

--

- We are interested in how the time it takes participants to identify a geometrical object (the response time) varies with the angle of rotation of that object.

--

- The results from our model comparisons showed that the "angle of rotation" was a good predictor of the response time of participants in the task.

--

- This predictor accounted for 95% of the total variability in our observations in comparison to the Null model.

--

- However, when we looked at the difference between observation and prediction, we noticed a pattern: the model was underestimating the response time of younger participants.

---

## Difference between observation and prediction

- When we graphed the difference between observation and prediction as a function of age (age being the variable on the *x-axis*), we saw a trend where the model had **on average** more positive errors for young participants.

--

<img src="data:image/png;base64,#lec-20_files/figure-html/epsilon-age-lm-1.png" style="display: block; margin: auto;" />

---

## Simple linear regression

<img src="data:image/png;base64,#lec-20_files/figure-html/rotation-colorage-1.png" style="display: block; margin: auto;" />

---

## Adding a new predictor

- There is a lot of variability in our observations, but we do see that older participants seem to have faster response times in comparison with younger participants.

--

- Now we will add the predictor **age** to our model so that we can compare a multiple linear regression with the simple linear regression, which only includes the **angle of rotation** as a predictor.

--

- This model will be formalized as follows:

`$$y_i \sim \text{Normal}(\beta_0 + \beta_1x_{i1} + \beta_2x_{i2}, \sigma_m^2)$$`

--

- This means that now we need to estimate the value of 3 parameters:

--

- The intercept `\(\beta_0\)`.

--

- The slope associated with changes in the angle of rotation `\(\beta_1\)`.

--

- The slope associated with changes in the age of participants `\(\beta_2\)`.

---

## Predictions of the multiple linear regression

- Given that now there are two slopes, the model will make a different prediction for each combination of our independent variables.

--

- This is similar to the idea of a factorial design, where the additive model could make a different prediction for each combination of the levels of the factors in a study.

--

- However, this time instead of having levels of a factor, what we have are continuous values of two independent variables.

--

- This means that we will have a predicted response time for a participant who is 10 years old and is responding to a figure that has been rotated 10 degrees, and this prediction will be different from the one for a participant who is 11 years old and is responding to a figure that was rotated 11 degrees.

--

- With the variables in our example, we can write the predicted response time as:

`$$\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1\text{angle}_i + \hat{\beta}_2\text{age}_i$$`

---

## Model predictions

- We have just introduced new notation. When we want to denote the prediction of a linear model for a single observation (a single row in our file), we use a "y" with a "hat".

--

- Remember that whenever we use "hats", that means that the variable is an estimator (or a statistic) and not an observation in the study.

--

- In other words, `\(y\)` and `\(\hat{y}\)` are not the same!

--

- We use this notation to introduce the error of the models, also known as "epsilon":

`$$\hat{\epsilon}_i = y_i - \hat{y}_i$$`

--

- We use `\(\hat{y}\)` to denote the prediction of a model so that we don't have to write the whole equation:

`$$\hat{\epsilon}_i = y_i - \hat{y}_i = y_i - \hat{\beta}_0 - \hat{\beta}_1x_{i1} - \hat{\beta}_2x_{i2}$$`

---

## Model errors

- This new notation will also allow us to write the squared error by observation:

`$$\hat{\epsilon}_i^2 = (y_i - \hat{y}_i)^2$$`

--

- `\(\hat{\epsilon}_i^2\)` is the variable that we have been adding to our data with the name "**error_model**" for all the examples in the class, and it represents the difference in squared units between the observation and prediction.

--

- If we **add** all of the values of `\(\hat{\epsilon}_i^2\)` we get the Sum of Squared Errors:

`$$SSE = \sum_{i = 1}^{n} \hat{\epsilon}_i^2$$`

--

- Now we have the notation that we need to refer to all the elements that we typically calculate in order to compare our models.

--

- Notice that these are not new variables, they are the same ones that we have been using in all problems. The only difference is that now we have introduced notation for them.

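---

## Sketch: the notation as R code

The notation on the previous slides maps directly onto columns that we can compute. The block below is a minimal sketch on a made-up data set with made-up coefficients (`toy`, `b0`, `b1`, `b2` are hypothetical names and values, not the mental rotation data or its estimates). It only illustrates how `\(\hat{y}_i\)`, `\(\hat{\epsilon}_i\)`, `\(\hat{\epsilon}_i^2\)` and the SSE relate to each other.

```r
# Hypothetical toy data (NOT the mental rotation data)
toy <- data.frame(
  response_time = c(900, 1500, 2100, 2600),
  angle         = c(20, 60, 100, 140),
  age           = c(25, 30, 35, 40)
)

# Made-up coefficient values, used only to illustrate the notation
b0 <- 600
b1 <- 15
b2 <- -10

# y-hat: prediction of the model for each row
toy$prediction <- b0 + b1 * toy$angle + b2 * toy$age

# epsilon-hat: observation minus prediction
toy$epsilon <- toy$response_time - toy$prediction

# epsilon-hat squared: squared error by observation
toy$error_model <- toy$epsilon^2

# SSE: sum of the squared errors over all rows
sse <- sum(toy$error_model)
sse  # 365000
```

Each line of the code corresponds to one of the equations above, which is why the same three columns can be reused for every model we compare.
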
---

## Multiple linear regression in R

- Now to get our estimates of the intercept `\((\beta_0)\)`, the slope for angle of rotation `\((\beta_1)\)` and the slope associated with the participant's age `\((\beta_2)\)` we can use the following code:

--

```r
# Fit the multiple linear regression and keep only the estimated coefficients
betas <- lm(formula = response_time ~ angle + age, data = mental_rotation)$coef
```

--

- The estimated values were equal to:

| Estimator           | Value  |
|---------------------|:------:|
| `\(\hat{\beta}_0\)` | 597.09 |
| `\(\hat{\beta}_1\)` | 15.09  |
| `\(\hat{\beta}_2\)` | -10.37 |

---

## Interpreting the estimates of `\(\beta\)`

- Now we can interpret each of the estimated values as follows:

--

- `\((\beta_0)\)`: The estimated value of the intercept was 597.09, which is the expected response time (in milliseconds) when both angle of rotation and age take the value of 0.

--

- `\((\beta_1)\)`: The estimated value of the slope associated with angle of rotation was equal to 15.09. This means that, for participants of the same age, the average response time increases by approximately 15 milliseconds for each additional degree by which a figure is rotated.

--

- `\((\beta_2)\)`: The estimated value of the slope associated with the age of participants was equal to -10.37. This means that, for figures with the same angle of rotation, the average response time decreases by approximately 10 milliseconds for each additional year of a participant's age.

--

- Now we can add the predictions of our model `\((\hat{y}_i)\)`, the errors `\((\hat{\epsilon}_i)\)` and the squared errors `\((\hat{\epsilon}^2_i)\)` to our data, and compare our new model with the two that we had previously.

---

## Adding predictions and errors

- The code that we need to add our predictions and errors to the data is the same as with the simple linear regression model; we just need to add the product of `\(\hat{\beta}_2\)` and the age of each participant to the prediction.

--

```r
# n_total (sample size) and sse_null (SSE of the Null model) are assumed
# to have been computed earlier, as in the previous examples

# Adding prediction, difference between outcome and prediction and
# squared error by observation
mental_rotation <- mental_rotation %>%
  mutate("prediction_angle_age" = betas[1] + betas[2] * angle + betas[3] * age,
         "epsilon_angle_age" = response_time - prediction_angle_age,
         "error_angle_age" = (epsilon_angle_age)^2)

# Sum of Squared Errors
sse_angle_age <- sum(mental_rotation$error_angle_age)

# Mean Squared Error
mse_angle_age <- 1/n_total * sse_angle_age

# Proportion of variance accounted for by Multiple linear regression
r2_angle_age <- (sse_null - sse_angle_age)/sse_null

# BIC of Multiple Linear Regression (3 parameters: intercept and two slopes)
bic_angle_age <- n_total * log(mse_angle_age) + 3 * log(n_total)
```

---

## Model comparison

- Now we have all the values that we need to compare the 3 models that we have talked about:

--

- Null: assumes that the response time of participants is constant regardless of age or angle of rotation of the figure.

--

- Simple linear regression: assumes that only the number of degrees that a figure has been rotated by has an effect on the expected response time.

--

- Multiple linear regression: assumes that response times change depending on the angle of rotation of a figure and the age of the participants.

--

- Another way of saying this is: "response times are a **linear function** of the angle of rotation of a figure and the age of the participant".

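---

## Sketch: the three models as `lm()` formulas

The three models described above differ only in which predictors appear on the right-hand side of the formula. The sketch below fits all three to simulated stand-in data (the data frame `sim` and every number used to generate it are made up for illustration, they are not the lecture's data) and compares their Sum of Squared Errors.

```r
# Simulated stand-in data; all numbers below are made up for illustration
set.seed(10)
n_total <- 200
sim <- data.frame(
  angle = runif(n_total, min = 0, max = 180),
  age   = runif(n_total, min = 18, max = 60)
)
sim$response_time <- 600 + 15 * sim$angle - 10 * sim$age +
  rnorm(n_total, mean = 0, sd = 200)

# The three models, written as lm() formulas
fit_null      <- lm(response_time ~ 1, data = sim)            # intercept only
fit_angle     <- lm(response_time ~ angle, data = sim)        # one slope
fit_angle_age <- lm(response_time ~ angle + age, data = sim)  # two slopes

# Sum of Squared Errors for each model
sse <- c(
  null      = sum(residuals(fit_null)^2),
  angle     = sum(residuals(fit_angle)^2),
  angle_age = sum(residuals(fit_angle_age)^2)
)
sse
```

Because each model contains the previous one as a special case, the SSE cannot increase when a predictor is added. This is why we rely on the BIC, which penalizes the number of parameters, to decide whether the extra predictor is worth including.
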
---

## Model comparison

- We will summarize the results using a table:

--

| Model                   | Parameters | MSE                       | `\(R^2\)` | BIC  |
|-------------------------|:----------:|:-------------------------:|:---------:|:----:|
| Null                    | 1          | `\(8.8083\times 10^{5}\)` | NA        | 4112 |
| Angle of Rotation       | 2          | `\(4.2866\times 10^{4}\)` | 0.951     | 3211 |
| Angle of Rotation & Age | 3          | `\(4.0933\times 10^{4}\)` | 0.954     | 3203 |

--

- Taking the square root of the MSE values in the table, we can see that the average error of the model that only takes angle of rotation into account is 207 milliseconds; adding age reduced the average error to 202 milliseconds.

--

- This is a small change in error, which is reflected in the estimated value of `\(R^2\)`: the model that takes both angle of rotation and age into account only explains an additional 0.3% of the total variability in comparison to the simple linear regression.

--

- Nevertheless, the BIC of the multiple linear regression (3203) is lower than the BIC of the simple linear regression (3211), which indicates that we can improve our predictions by adding the age of the participants as a predictor in the model.

---

# Interpretation

- In summary, the results from the model comparisons indicate that the best model is the one that assumes that the expected response time in the experiment is a **linear function** of the angle of rotation of a figure and the age of the participant.

--

- According to the model, the average response time increases with the degrees that a figure is rotated. Furthermore, older participants seem to have lower response times on average for figures with the same angle of rotation.

--

- Now we can evaluate whether the multiple linear regression has the properties that we were looking for:

--

- Errors follow approximately a Normal distribution centered at 0.

--

- There are no patterns in the errors when graphed against other variables.

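---

## Sketch: checking the values in the table

The average errors and the BIC column of the model comparison table can be recomputed from the MSE column alone. The sketch below does this with the values copied from the table. One value is an assumption: the slides do not state the sample size, and `n_total = 300` is used here only because it reproduces the reported BIC values with the formula `n_total * log(mse) + k * log(n_total)`.

```r
# Values copied from the model comparison table
mse <- c(null = 8.8083e5, angle = 4.2866e4, angle_age = 4.0933e4)
k   <- c(null = 1, angle = 2, angle_age = 3)  # number of parameters

# Average error in milliseconds: square root of the MSE
rmse <- sqrt(mse)
round(rmse)

# ASSUMPTION: n_total = 300 is not stated on the slides; it is a value
# that reproduces the BIC column with the formula used in this class
n_total <- 300
bic <- n_total * log(mse) + k * log(n_total)
round(bic)

# The model with the lowest BIC is the preferred one
names(which.min(bic))
```

Under that assumption, the penalty for the extra parameter is `log(300)`, about 5.7 BIC units, and the reduction in MSE from adding age is large enough to outweigh it.
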
---

## Histogram: `\(\hat{\epsilon}_i\)`

.pull-left[

```r
ggplot(data = mental_rotation) + #BREAK
  aes(x = epsilon_angle_age) + #BREAK
  geom_histogram(color = "#E72F52", fill = "#E72F52", binwidth = 40,
                 aes(y = ..density..)) + #BREAK
  theme_classic() + #BREAK
  xlab("Response time - Prediction") +
  ylab("") +
  theme(axis.title.x = element_text(size = 30)) + #BREAK
  stat_function(fun = dnorm,
                args = list(mean = 0, sd = sqrt(mse_angle_age)))
```

]

.pull-right[

<img src="data:image/png;base64,#lec-20_files/figure-html/epsilon-hist-out-1.png" style="display: block; margin: auto;" />

]

--

- The errors seem to follow approximately a Normal distribution centered around 0.

---

## Scatter-plot: `\(\hat{\epsilon}_i\)`

.pull-left[

```r
ggplot(data = mental_rotation) + #BREAK
  aes(x = as.numeric(rownames(mental_rotation))) + #BREAK
  aes(y = epsilon_angle_age) + #BREAK
  geom_point(color = "#E72F52", size = 4) + #BREAK
  theme_classic() + #BREAK
  xlab("Observation number") +
  ylab("Response time - Prediction") +
  theme(axis.title.x = element_text(size = 30),
        axis.title.y = element_text(size = 30)) + #BREAK
  geom_smooth(se = FALSE, size = 2, color = "#0d95d0")
```

]

.pull-right[

<img src="data:image/png;base64,#lec-20_files/figure-html/epsilon-index-out-1.png" style="display: block; margin: auto;" />

]

--

- There is no pattern in the errors when plotted against the observation number.

---

## Scatter-plot: `\(\hat{\epsilon}_i\)`

.pull-left[

```r
ggplot(data = mental_rotation) + #BREAK
  aes(x = age) + #BREAK
  aes(y = epsilon_angle_age) + #BREAK
  geom_point(color = "#E72F52", size = 4) + #BREAK
  theme_classic() + #BREAK
  xlab("Participant age") +
  ylab("Response time - Prediction") +
  theme(axis.title.x = element_text(size = 30),
        axis.title.y = element_text(size = 30)) + #BREAK
  geom_smooth(se = FALSE, size = 2, color = "#0d95d0")
```

]

.pull-right[

<img src="data:image/png;base64,#lec-20_files/figure-html/epsilon-age-out-1.png" style="display: block; margin: auto;" />

]

--

- There is no longer a pattern in the errors when graphed against the age of the participants (because we are now using age as a predictor).

---

## Model evaluation

- Given that the errors show no obvious deviations from a Normal distribution centered at 0 with a variance of `\(\hat{\sigma}^2_m\)`, and that there are no patterns when we graph the errors against other variables in the study, we can be more "comfortable" with our choice of a linear model.

--

- The best model according to our analysis was the multiple linear regression that includes angle of rotation and participant age as predictors, therefore we can conclude that:

--

- The model that best accounted for our observations was a multiple linear regression that includes angle of rotation and participant's age as predictors of response time in a mental rotation task. The results suggest that the average response time of participants increases with the angle of rotation and decreases with participant's age. The model can account for approximately 95% of the total variability in response times.
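
---

## Sketch: using the selected model to make a prediction

To make the conclusion concrete, the reported estimates can be plugged into the prediction equation `\(\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1\text{angle}_i + \hat{\beta}_2\text{age}_i\)`. The sketch below uses the three estimates from the slides; the helper `predict_rt()` and the two example participants (ages 30 and 31, figure rotated 90 degrees) are hypothetical and chosen only for illustration.

```r
# Estimates reported on the slides (intercept, angle slope, age slope)
betas <- c(597.09, 15.09, -10.37)

# Hypothetical helper (not from the lecture): predicted response time in ms
predict_rt <- function(angle, age) {
  betas[1] + betas[2] * angle + betas[3] * age
}

# Hypothetical participants: same 90 degree figure, ages 30 and 31
rt_30 <- predict_rt(angle = 90, age = 30)
rt_31 <- predict_rt(angle = 90, age = 31)

rt_30          # 1644.09
rt_31 - rt_30  # -10.37, the age slope
```

Holding the angle fixed, one additional year of age changes the prediction by exactly the value of the age slope, which is the interpretation of `\(\hat{\beta}_2\)` given earlier.
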